MaxTract: Converting PDF to LaTeX, MathML and Text
Authors
Abstract
In this paper we present the first public, online demonstration of MaxTract, a tool that converts PDF files containing mathematics into multiple formats including LaTeX, HTML with embedded MathML, and plain text. Using a bespoke PDF parser and image analyser, we directly extract character and font information to use as input for a linear grammar which, in conjunction with specialised drivers, can accurately recognise and reproduce both the two-dimensional relationships between symbols in mathematical formulae and the one-dimensional relationships present in standard text. The main goals of MaxTract are to provide translation services into standard mathematical markup languages and to add accessibility to mathematical documents on multiple levels. This includes accessibility in the narrow sense of providing access to content for print-impaired users, such as those with visual impairments, dyslexia or dyspraxia, as well as, more generally, enabling any user to access the mathematical content at more re-usable levels than the merely visual. MaxTract produces output compatible with web browsers, screen readers, and tools such as copy and paste, which is achieved by enriching the regular text with mathematical markup. The output can also be used directly, within the limits of the presentation MathML produced, as machine-readable mathematical input to software systems such as Mathematica or Maple.
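To make the idea of recovering two-dimensional relationships from character and font information concrete, the following is a minimal sketch in Python. It is not MaxTract's grammar or driver code: the `Glyph` record, the `relation` rule, and the threshold `tol` are hypothetical names and values chosen for illustration, and the sketch only handles a single level of superscripts and subscripts in a left-to-right run.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape


@dataclass
class Glyph:
    """Hypothetical record of what a PDF parser might report per character."""
    char: str
    x: float         # left edge, in points
    baseline: float  # baseline height, in points (y axis pointing up)
    size: float      # font size, in points


def relation(base: Glyph, nxt: Glyph, tol: float = 0.15) -> str:
    """Classify nxt relative to base as 'sup', 'sub' or 'inline'.

    Assumed rule: a script is set in a smaller font and its baseline is
    shifted by more than `tol` times the base font size.
    """
    shift = (nxt.baseline - base.baseline) / base.size
    if nxt.size < base.size and shift > tol:
        return "sup"
    if nxt.size < base.size and shift < -tol:
        return "sub"
    return "inline"


def token(g: Glyph) -> str:
    """Wrap one character in a presentation MathML token element."""
    if g.char.isdigit():
        tag = "mn"
    elif g.char.isalpha():
        tag = "mi"
    else:
        tag = "mo"
    return f"<{tag}>{escape(g.char)}</{tag}>"


def convert(glyphs: list[Glyph]) -> tuple[str, str]:
    """Return (latex, mathml) for a run of glyphs read left to right."""
    glyphs = sorted(glyphs, key=lambda g: g.x)
    latex, mathml = [], []
    i = 0
    while i < len(glyphs):
        base = glyphs[i]
        rel = relation(base, glyphs[i + 1]) if i + 1 < len(glyphs) else "inline"
        if rel == "inline":
            latex.append(base.char)
            mathml.append(token(base))
            i += 1
        else:
            script = glyphs[i + 1]
            op, tag = ("^", "msup") if rel == "sup" else ("_", "msub")
            latex.append(f"{base.char}{op}{{{script.char}}}")
            mathml.append(f"<{tag}>{token(base)}{token(script)}</{tag}>")
            i += 2  # the script glyph has been consumed with its base
    body = "".join(mathml)
    return (
        "".join(latex),
        f'<math xmlns="http://www.w3.org/1998/Math/MathML"><mrow>{body}</mrow></math>',
    )


# Illustrative input: made-up coordinates for "x squared plus y sub one".
demo = [
    Glyph("x", 0.0, 100.0, 10.0),
    Glyph("2", 6.0, 104.0, 7.0),
    Glyph("+", 12.0, 100.0, 10.0),
    Glyph("y", 20.0, 100.0, 10.0),
    Glyph("1", 26.0, 98.0, 7.0),
]
latex_out, mathml_out = convert(demo)
print(latex_out)
print(mathml_out)
```

The same glyph run yields both a LaTeX string and a presentation MathML fragment, which mirrors the multiple output formats described above. A real system would need to handle nesting, fractions, radicals, and matrices as well.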
Similar resources
Tagged mathematics in PDFs for accessibility and other purposes
PDF has been the preferred format for publishing mathematics for many years now. With changes to methods of delivery (i.e., electronic rather than predominantly paper) there need to be corresponding enhancements in the document format. Not least among these can be implicit legal obligations to satisfy Accessibility criteria. The answer developed for PDF is tagging of document structure and cont...
PDF/A-3u as an Archival Format for Accessible Mathematics
Including LaTeX source of mathematical expressions, within the PDF document of a text-book or research paper, has definite benefits regarding ‘Accessibility’ considerations. Here we describe three ways in which this can be done, fully compatibly with international standards ISO32000, ISO19005-3, and the forthcoming ISO32000-2 (PDF 2.0). Two methods use embedded files, also known as ‘attachments’...
Augmenting Presentation MathML for Search
The ubiquity of text search is both a boon and a bane for the quest for math search. A bane in that users’ expectations are high regarding accuracy, in-context highlighting and similar features. Yet also a boon with the availability of highly evolved search engine libraries; Youssef has previously shown how an appropriate ‘textualization’ of mathematics into an indexable form allows standard text...
CEDRICS: When CEDRAM Meets Tralics
We describe CEDRICS, a general purpose system for automated journal production entirely based on a LaTeX input format. We show how the very basic ideas that initiated the whole effort turned into an efficient system because of the ability of LaTeX markup to parametrise simultaneously and without compromise high typographical quality for the PDF output as well as accurate XML metadata with (presen...
A Reappraisal of Online Mathematics Teaching Using LaTeX
The mathematics language LaTeX is often seen as a legacy technology that is awkward to use. MathML, a verbose language designed for data exchange and to be written and understood by machines, is by contrast seen as something that will aid online mathematics, and the lack of browser support for it is bemoaned. However, LaTeX can already do many of the things that MathML might promise. LaTeX is here proposed ...
Publication date: 2012